Computing equilibria in two–player zero–sum continuous stochastic games with switching controller
Abstract
Equilibrium computation in continuous games is currently a challenging open task in artificial intelligence. In this paper, we design an iterative algorithm that finds an ε-approximate Markov perfect equilibrium in two-player zero-sum continuous stochastic games with switching controller. When the game is polynomial (i.e., utility and state transitions are polynomial functions), our algorithm converges to ε = 0 by exploiting semidefinite programming. When the game is not polynomial, the algorithm exploits polynomial approximations and converges to an ε value whose upper bound is a function of the maximum approximation error in infinity norm. To our knowledge, this is the first algorithm for equilibrium approximation with arbitrary utility and transition functions that provides theoretical guarantees. The algorithm is also evaluated empirically.

Introduction

The computation of game-theoretic solutions is a central task in artificial intelligence (Shoham and Leyton-Brown 2010). Game theory provides the most elegant game models and solution concepts, but it leaves open the problem of computing a solution (Fudenberg and Tirole 1991). A game is a pair: a mechanism (specifying the rules) and the strategies (specifying the agents' behavior). The central solution concept is the Nash equilibrium (NE). Every finite game is guaranteed to have at least one NE in mixed strategies, but computing one is PPAD-complete even with just two agents (Chen, Deng, and Teng 2009); with two agents and zero-sum utilities, by contrast, the problem is in P. A challenging game class is that of continuous games (Karlin 1951), in which the actions of the agents are real values. These games are very common in practical scenarios (e.g., auctions, bargaining, dynamic games). Differently from finite games, continuous games may not admit NEs, e.g., due to discontinuity of the utility functions (Rosen 1987).
Copyright © 2013, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

With compact and convex action spaces and continuous utility functions, a slight variation of Kakutani's fixed point theorem assures the existence of at least one NE in mixed strategies (Glicksberg 1952). Adding the hypothesis of concave utility functions, continuous games always admit equilibria in pure strategies, the utility functions being equivalent to the convexification of a finite set of actions (Rosen 1987). However, in general settings, NEs are in mixed strategies and their study is hard. A very special class is that of separable games, where each agent's payoff is a finite sum of products of functions of each player's strategy separately (e.g., polynomials). It is known that every separable game admits a finitely supported NE. This has been shown for zero-sum games in (Dresher, Karlin, and Shapley 1950) and, recently, for general-sum games in (Stein, Ozdaglar, and Parrilo 2008). When games are not separable, instead, NEs may not be finitely supported (Karlin 1959). Few computational results are known for continuous games. For the special case of two agents with zero-sum polynomial utility functions, the computation of an NE can be formulated (Parrilo 2006) as a pair (primal/dual) of semidefinite programs (SDPs), which can be solved efficiently by convex programming algorithms (Blekherman, Parrilo, and Thomas 2012). With general separable utility functions, non-linear programming tools must be used, without any guarantee of finding the optimal solution and with algorithms that scale poorly (to the best of our knowledge, no work in the literature has dealt with this problem).
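As a minimal numeric illustration of why mixed strategies are needed in continuous zero-sum games (a grid-discretization sketch of our own, not a method from this paper), consider the polynomial kernel u(x, y) = (x − y)² on [0, 1]²: its pure-strategy maxmin and minmax values differ, so no pure-strategy saddle point exists.

```python
import numpy as np

# Discretize the polynomial kernel u(x, y) = (x - y)^2 on [0, 1]^2.
# Rows index the max agent's action x, columns the min agent's action y.
grid = np.linspace(0.0, 1.0, 101)
U = (grid[:, None] - grid[None, :]) ** 2

maxmin = U.min(axis=1).max()  # best guarantee for the max agent in pure strategies
minmax = U.max(axis=0).min()  # best guarantee for the min agent in pure strategies

# The positive gap shows that no pure-strategy saddle point exists: the
# max agent must mix (1/2 on x = 0, 1/2 on x = 1) to secure the game
# value 1/4 against the min agent's best reply y = 1/2.
print(maxmin, minmax)  # 0.0  0.25
```

Here the mixed equilibrium is finitely supported, consistent with the separability of the polynomial kernel.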
With arbitrary (non-separable) utility functions, some works deal with the problem of approximating a pure-strategy equilibrium by exploiting local search and Monte Carlo methods in normal-form games (Vorobeychik and Wellman 2008) and extensive-form games (Gatti and Restelli 2011), but no work deals with the problem of finding mixed-strategy NEs. In this paper, we focus on continuous stochastic games with switching controller. The literature only provides results for finite stochastic games with switching controller (Vrieze et al. 1983) and for polynomial continuous stochastic games with single controller (Shah and Parrilo 2007). Our original contributions include the following.

• We provide an iterative SDP-based algorithm that converges to a Markov perfect equilibrium (MPE), the appropriate solution concept for stochastic games, when both the rewards and the state transitions are polynomial functions, and that returns an ε-approximate MPE (ε-MPE) with ε ≤ ε̄, where ε̄ is given as input.

• We use our algorithm to approximate solutions of non-polynomial games: we approximate the rewards and state transitions given as input with polynomials, solve the approximated game, and provide theoretical upper bounds on the quality of the solutions as functions of the approximation error in infinity norm.

• We experimentally evaluate the performance of our algorithm in terms of iterations and compute time.
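The second contribution hinges on how fast the infinity-norm error of a polynomial approximation shrinks with the degree. A small sketch (the target function exp and the degrees are illustrative choices of ours, not taken from the paper), using Chebyshev interpolation from numpy:

```python
import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

# Approximate a non-polynomial one-dimensional reward r(x) = exp(x)
# on [-1, 1] by Chebyshev interpolants of increasing degree, and
# estimate the infinity-norm approximation error that an epsilon
# bound of this kind would depend on.
xs = np.linspace(-1.0, 1.0, 2001)
errors = []
for deg in (2, 4, 8):
    poly = Chebyshev.interpolate(np.exp, deg)
    errors.append(np.max(np.abs(poly(xs) - np.exp(xs))))  # grid sup-norm

print(errors)  # the error shrinks rapidly as the degree grows
```

For smooth rewards and transitions the sup-norm error decays quickly with the degree, so modest-degree polynomial surrogates can already yield small ε.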
Game model and solution concepts

A two-player zero-sum stochastic game G, introduced in (Shapley 1953), is a tuple (N, S, X, P, R, γ), where: S is a finite set of states (s ∈ S denotes a generic state); N is the set of agents (i ∈ N denotes a generic agent); X is the set of actions (X_{i,s} ⊆ X denotes the set of actions available to agent i at state s, and x_i ∈ X_{i,s} a generic action of agent i); P is a set of maps p_{s,s′} : ×_{i∈N} X_{i,s} → [0, 1] assigning the probability of moving from state s to state s′ given the actions of all the agents; R is a set of maps r_s : ×_{i∈N} X_{i,s} → ℝ assigning a reward to each action profile; γ ∈ (0, 1) is the temporal discount factor. Without loss of generality, we assume that agent 1 is the max agent. We denote by u_s, where s ∈ S, the utility function of agent 1, while −u_s is the utility function of agent 2. By the Bellman equation, u_s is defined as

u_s(x_1, x_2) = r_s(x_{1,s}, x_{2,s}) + γ · Σ_{s′} p_{s,s′}(x_{1,s}, x_{2,s}) · u_{s′}(x_1, x_2),

where x_i is the vector of actions of agent i over all the states and x_{i,s} is the specific action of agent i at state s. A continuous stochastic game is a stochastic game (N, S, X, P, R, γ) where the action spaces X_{i,s} are bounded subspaces (usually compact) of the Euclidean space and the maps p_{s,s′} and r_s are generic functions. We focus on continuous stochastic games with switching controller, in which at each state s the functions p_{s,s′} depend either on x_1 ∈ X_{1,s} or on x_2 ∈ X_{2,s}. The agent i who drives the transition at state s is said to be the controller of that state and is denoted by c_s. Notice that different states can be controlled by different agents. We partition the state space S as S = S_1 ∪ S_2, where S_i is the set of states where c_s = i. When the game is polynomial, we have:

p_{s,s′}(x_{c_s}) = Σ_{k=0}^{m} p_{s,s′,k} · (x_{c_s})^k and r_s(x_1, x_2) = Σ_{k=0}^{m} Σ_{j=0}^{m} r_{s,k,j} · (x_1)^k · (x_2)^j,

where m is the maximum degree of all the polynomials and p_{s,s′,k}, r_{s,k,j} ∈ ℝ are coefficients.
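To make the model concrete, here is a minimal numeric sketch (every coefficient is invented for illustration, not taken from the paper) of a two-state polynomial game with switching controller: for fixed pure actions the Bellman equation is linear, so the utilities u_s can be read off a linear system.

```python
import numpy as np

# Hypothetical instance: state 0 is controlled by agent 1, state 1
# by agent 2; transitions depend only on the controller's action.
gamma = 0.9
controller = [0, 1]            # c_s: index of the agent controlling state s
x1 = np.array([0.8, 0.3])      # agent 1's pure action in each state
x2 = np.array([0.2, 0.6])      # agent 2's pure action in each state

P = np.zeros((2, 2))           # P[s, s'] = p_{s,s'}(x_{c_s})
R = np.zeros(2)                # R[s] = r_s(x_{1,s}, x_{2,s})
for s in range(2):
    xc = (x1, x2)[controller[s]][s]
    P[s, s] = xc ** 2            # polynomial transition: stay w.p. (x_{c_s})^2
    P[s, 1 - s] = 1.0 - xc ** 2
    R[s] = (x1[s] - x2[s]) ** 2  # a polynomial reward r_s

# Bellman: u = R + gamma * P @ u, i.e. (I - gamma P) u = R.
u = np.linalg.solve(np.eye(2) - gamma * P, R)
print(u)
```

Computing equilibrium strategies, of course, requires optimizing over probability measures on the action spaces rather than fixing pure actions; that is what the SDPs below are for.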
A strategy profile (σ_1, σ_2) specifies the strategy σ_{i,s} of each agent i at each state s as a probability measure over the action space X_{i,s}. An MPE is a strategy profile (σ*_1, σ*_2) where each strategy is conditioned only on the current state, and such that no agent can improve her utility u_s (or −u_s) in any state s by changing her strategy. In zero-sum games, an MPE corresponds to maxmin/minmax strategies. In this paper, we also resort to the ε-MPE concept, defined as follows: a strategy profile (σ_1, σ_2) is an ε-MPE if no agent can improve her utility u_s (or −u_s) in any state s by more than ε by changing her strategy. Obviously, an ε-MPE with ε = 0 is an MPE. Furthermore, while an MPE may not exist in continuous games, it is always possible to find an ε-MPE for some ε.

Algorithm 1 Iterative Nash approximation
 1: assign û_s = 0 for every s ∈ S
 2: repeat
 3:   [û, σ^{S_1}_2] = solve PS1(û)
 4:   [û, σ^{S_1}_1] = solve DS1(û)
 5:   [û, σ^{S_2}_1] = solve PS2(û)
 6:   [û, σ^{S_2}_2] = solve DS2(û)
 7:   assign σ_1 = (σ^{S_1}_1, σ^{S_2}_1) and σ_2 = (σ^{S_1}_2, σ^{S_2}_2)
 8:   calculate u with (σ_1, σ_2)
 9:   u*_1 = solve BR1 with σ_2 = (σ^{S_1}_2, σ^{S_2}_2)
10:   u*_2 = solve BR2 with σ_1 = (σ^{S_1}_1, σ^{S_2}_1)
11: until max{‖u*_1 − u‖_∞, ‖u − u*_2‖_∞} ≤ ε

Equilibrium computation with polynomial games

Here, we describe the algorithm that converges to an MPE in polynomial games. For the sake of clarity, we initially describe the algorithm omitting the details of the SDPs it uses; details are provided later.

Algorithm

The procedure is summarized in Algorithm 1. The algorithm uses auxiliary utilities û. As shown below, these utilities converge to the utilities at the equilibrium, denoted by u. Initially (Step 1), the algorithm sets û_s = 0 for every s ∈ S. Then, it repeats Steps 3–11 until an ε-MPE has been found, where ε is given as input.
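The loop structure of Algorithm 1 can be sketched as follows. This is a structural sketch only: the four SDPs (PS1, DS1, PS2, DS2) and the best-response problems (BR1, BR2) are abstracted as caller-supplied callables (their internals are the subject of the next sections), and the dummy solvers used in the usage example are invented stand-ins that merely contract û toward a fixed point so the loop terminates.

```python
import numpy as np

# Structural sketch of Algorithm 1: what it shows is the outer loop
# and the Step 11 stopping test on the best-response utility gaps.
def iterative_nash(solve_sdps, best_resp_1, best_resp_2, n_states, eps):
    u_hat = np.zeros(n_states)          # Step 1: u_hat_s = 0 for all s
    while True:                         # Steps 2-11
        u_hat = solve_sdps(u_hat)       # Steps 3-6: PS1/DS1/PS2/DS2 updates
        u = u_hat.copy()                # Step 8: utilities under (sigma1, sigma2)
        u1 = best_resp_1(u)             # Step 9: agent 1's best-response value
        u2 = best_resp_2(u)             # Step 10: agent 2's best-response value
        gap = max(np.abs(u1 - u).max(), np.abs(u - u2).max())
        if gap <= eps:                  # Step 11: epsilon-MPE test
            return u, gap

# Dummy stand-in solvers (not the paper's SDPs): contract toward a
# fixed point 'target' so that the best-response gaps shrink to zero.
target = np.array([1.0, -0.5])
u, gap = iterative_nash(lambda v: 0.5 * (v + target),
                        lambda v: target,
                        lambda v: 2.0 * v - target,
                        n_states=2, eps=1e-6)
print(u, gap)
```

With the contracting stand-ins, the gap halves at every iteration, so the loop exits after roughly 20 iterations with u within 1e-6 of the fixed point.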
At first, the algorithm finds the optimal strategies in the states S_1 controlled by agent 1, when the utilities of the states s ∈ S_2 are fixed to û_s, and assigns the returned optimal utility values of the states s ∈ S_1 to û_s. This is accomplished in two steps: in Step 3, the optimal strategy of agent 2 is computed by solving an SDP called PS1, while in Step 4, the optimal strategy of agent 1 is computed by solving an SDP called DS1. PS1 is the primal problem and DS1 the dual problem. (As we discuss in the following section, strong duality holds for these two problems.) The problem PS1: